#loading packages
library(lubridate)
library(dplyr)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.5     ✓ stringr 1.4.0
## ✓ tidyr   1.1.2     ✓ forcats 0.5.0
## ✓ readr   1.4.0
library(ggridges) # for joy plots
library(plotly) 
library(gganimate)     # for adding animation layers to ggplots
library(gifski)        # for creating the gif (no need to load this library every time, but it must be installed)
#loading data
spotify <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   track_id = col_character(),
##   track_name = col_character(),
##   track_artist = col_character(),
##   track_album_id = col_character(),
##   track_album_name = col_character(),
##   track_album_release_date = col_character(),
##   playlist_name = col_character(),
##   playlist_id = col_character(),
##   playlist_genre = col_character(),
##   playlist_subgenre = col_character()
## )
## ℹ Use `spec()` for the full column specifications.
spotify_rap <- spotify %>% 
  filter(playlist_genre == "rap")

randb <- spotify %>%
  filter(playlist_genre == "r&b") %>%
  select(-track_id, -track_album_id, -playlist_id, -playlist_name) %>%
  filter(track_popularity >= 75)

Introduction & Background

Why did we do an analysis on Spotify? Why is the data significant, and why should people care? Introduce the data to the audience.

Using this dataset, we hope to study the technicalities of music and

Aside from personal interest…

Data Collection

Data retrieved from GitHub: https://github.com/rfordatascience/tidytuesday/blob/faca0b6bd282998693007c329e3f4b917a5fd7a8/data/2020/2020-01-21/readme.md Who collected the data and what purpose does it serve? Who funded the data collection? Are there any possible biases? What are the implications of the analysis of this dataset, ethical or otherwise?

How has the popularity of genres changed over time?

genre_pop <- spotify %>%
  filter(track_popularity >= 75) %>%
  # ymd() failed to parse 68 release dates, so take the year from the
  # first four characters instead (assumes every date starts with YYYY)
  mutate(year = as.integer(substr(track_album_release_date, 1, 4))) %>%
  group_by(year, playlist_genre) %>%
  # .groups = "drop" returns an ungrouped result and silences the regrouping message
  summarize(avg_popularity = mean(track_popularity), .groups = "drop") %>%
  ggplot(aes(x = year, y = avg_popularity, color = playlist_genre)) +
  geom_point() +
  labs(title = "Average song popularity by genre per year",
       subtitle = "Overall, as music becomes more accessible, average popularity across all genres is on the rise.",
       x = "",
       y = "",
       color = "Genre") +
  theme_classic()
ggplotly(genre_pop)
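
Only the release year is needed for this plot, so one way to avoid losing rows to date parsing is to read the year straight from the string. The sketch below uses a toy tibble (`releases` is a made-up name) and assumes the unparsed values are year-only or year-month strings such as "2012" or "2008-03"; we have not checked that against the full column.

```r
library(dplyr)

# Toy release dates in the shapes assumed to occur in the column
releases <- tibble(
  track_album_release_date = c("2019-06-14", "2012", "2008-03", NA)
)

release_years <- releases %>%
  mutate(
    # TRUE only for complete YYYY-MM-DD strings
    is_full_date = grepl("^\\d{4}-\\d{2}-\\d{2}$", track_album_release_date),
    # the leading four characters hold the year in all three shapes
    year = as.integer(substr(track_album_release_date, 1, 4))
  )

release_years
```

Counting `is_full_date == FALSE` on the real data would show how many rows are affected and whether this assumption holds.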
prelim_graph <- spotify %>%
  ggplot(aes(y = playlist_genre, x = track_popularity)) +
  labs(title = "Song Popularity by Genre",
       x = "", y = "",
       subtitle = "Song popularity is measured from 0 to 100, with higher numbers indicating more popularity.\nThe highest median popularities belong to pop and Latin; the overall median popularity is 40 (blue line).",
       caption = "Alex Ismail, Malek Kaloti, Brian Lee") +
  theme_classic() +
  theme(plot.title.position = "plot",
        plot.title = element_text(size = 20, face = "bold"),
        plot.subtitle = element_text(size = 10, face = "italic")) +
  geom_boxplot() +
  # overall median as a single reference line (computed once, not once per row)
  geom_vline(xintercept = median(spotify$track_popularity, na.rm = TRUE), color = "blue")

prelim_graph
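
The medians quoted in the subtitle can be checked numerically rather than read off the boxplot. Here is a minimal sketch on toy data (`toy_tracks` and its values are invented for illustration); swapping in `spotify` would give the real per-genre and overall medians.

```r
library(dplyr)

# Invented popularity scores for two genres
toy_tracks <- tibble(
  playlist_genre   = c("pop", "pop", "pop", "rap", "rap", "rap"),
  track_popularity = c(50, 60, 70, 20, 40, 90)
)

# Median popularity per genre, highest first
genre_medians <- toy_tracks %>%
  group_by(playlist_genre) %>%
  summarize(median_popularity = median(track_popularity, na.rm = TRUE),
            .groups = "drop") %>%
  arrange(desc(median_popularity))

# Median across all tracks, the value the blue line is meant to mark
overall_median <- median(toy_tracks$track_popularity, na.rm = TRUE)

genre_medians
overall_median
```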

Analysis!

Rap

Rap is a particularly fascinating genre to investigate with the Spotify data: the genre has undergone several changes in audience and style, so it is a good place to ask which traits of music have correlated with popularity. Though a relatively new genre that arrived on the greater music scene in the 80s, rap has gone through a myriad of trends and style variations. Fans of old-school rap from the 80s and 90s may dislike today’s artists like Drake and Eminem for having modernized the genre too much, while fans of modern rap may get bored of the authentic sound of artists like Run-DMC or Tupac. Are there trends that tie all of rap together as to what makes a song popular?

Song Quality

The first and most natural observations to make are on the overarching metrics that Spotify provides. Using the descriptions provided, I was most interested in how the following values correlate with track popularity: Danceability, due to rap’s heavy emphasis on rhythm and beats; Energy, due to some artists’ signature style of shouting to “hype” up a crowd (e.g., Lil Jon, DMX); the inverse variables Speechiness and Instrumentalness, due to other artists’ signature of rapping as fast as possible (e.g., Eminem, Busta Rhymes); and Valence, for the perceived association between rap and violence, drugs, and other less-than-righteous topics.

[Figure: rap song popularity against rounded danceability, energy, speechiness, instrumentalness, and valence]

Oddly, the biggest conclusion I drew from this graph was not a positive or negative correlation but the absence of one: for a genre with a reputation for being connected with gangs, guns, drugs, etc., valence shows no apparent relationship with popularity. Beyond that, there is a moderately strong correlation between popularity and danceability, as I had expected based on the prevalence of beats and rhythms in rap. The energy line shows that the highest percentage of songs to become popular sit at an energy of about 0.5, which suggests that too much energy can take away from a song’s popularity. Finally, the speechiness/instrumentalness variable shows that songs on the extreme end of speechiness (0.8+) are most likely to be popular.
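
The code behind the graph is not shown here, so the following is only a sketch of one way a summary like it could be built: round each feature to one decimal place and compute the share of songs in each bin that reach a popularity of 75 or higher. The toy data, the `popular_share` name, and the 75 cutoff for "popular" are assumptions for illustration, not necessarily what the original chunk did.

```r
library(dplyr)
library(tidyr)

# Invented rap tracks with two of the five features
toy_rap <- tibble(
  track_popularity = c(80, 30, 90, 40, 76, 10),
  danceability     = c(0.81, 0.79, 0.52, 0.48, 0.84, 0.51),
  valence          = c(0.22, 0.18, 0.61, 0.58, 0.63, 0.21)
)

popular_share <- toy_rap %>%
  # one row per track and feature
  pivot_longer(c(danceability, valence),
               names_to = "feature", values_to = "value") %>%
  # bin each feature to one decimal place
  mutate(bin = round(value, 1)) %>%
  group_by(feature, bin) %>%
  # share of tracks in the bin with popularity of 75 or higher
  summarize(n_songs = n(),
            share_popular = mean(track_popularity >= 75),
            .groups = "drop")

popular_share
```

Keeping `n_songs` next to the share makes it easier to spot bins where a high percentage rests on only a handful of tracks.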

R&B

Why R&B?

In this section, I want to take a closer look at one of my favorite genres of music, R&B. I think I love it so much because it’s often good music to unwind to: it’s smooth, slow, and relaxing. I also love its versatility! R&B can fit the mood of anything from a gloomy, rainy day to a bright, sunny day. But why? What characteristics make R&B such a great genre to listen to? Using the Spotify dataset and some visualizations of the specific characteristics of the most popular R&B songs (songs with a popularity rating of 75 or higher), I hope to come closer to answering these questions.

randb %>%
  select(track_name, track_artist, playlist_genre, playlist_subgenre, track_popularity, danceability, energy, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, duration_ms) %>%
  arrange(desc(track_popularity)) %>%
  head(12) %>%
  knitr::kable() 
| track_name | track_artist | playlist_genre | playlist_subgenre | track_popularity | danceability | energy | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | duration_ms |
|:--|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| ROXANNE | Arizona Zervas | r&b | urban contemporary | 99 | 0.621 | 0.601 | -5.616 | 0 | 0.1480 | 0.05220 | 0.000000 | 0.4600 | 0.457 | 163636 |
| ROXANNE | Arizona Zervas | r&b | hip pop | 99 | 0.621 | 0.601 | -5.616 | 0 | 0.1480 | 0.05220 | 0.000000 | 0.4600 | 0.457 | 163636 |
| The Box | Roddy Ricch | r&b | urban contemporary | 98 | 0.896 | 0.586 | -6.687 | 0 | 0.0559 | 0.10400 | 0.000000 | 0.7900 | 0.642 | 196653 |
| Memories | Maroon 5 | r&b | urban contemporary | 98 | 0.764 | 0.320 | -7.209 | 1 | 0.0546 | 0.83700 | 0.000000 | 0.0822 | 0.575 | 189486 |
| Blinding Lights | The Weeknd | r&b | urban contemporary | 98 | 0.513 | 0.796 | -4.075 | 1 | 0.0629 | 0.00147 | 0.000209 | 0.0938 | 0.345 | 201573 |
| Blinding Lights | The Weeknd | r&b | hip pop | 98 | 0.513 | 0.796 | -4.075 | 1 | 0.0629 | 0.00147 | 0.000209 | 0.0938 | 0.345 | 201573 |
| The Box | Roddy Ricch | r&b | hip pop | 98 | 0.896 | 0.586 | -6.687 | 0 | 0.0559 | 0.10400 | 0.000000 | 0.7900 | 0.642 | 196653 |
| Tusa | KAROL G | r&b | hip pop | 98 | 0.803 | 0.715 | -3.280 | 1 | 0.2980 | 0.29500 | 0.000134 | 0.0574 | 0.574 | 200960 |
| Memories | Maroon 5 | r&b | hip pop | 98 | 0.764 | 0.320 | -7.209 | 1 | 0.0546 | 0.83700 | 0.000000 | 0.0822 | 0.575 | 189486 |
| Circles | Post Malone | r&b | hip pop | 98 | 0.695 | 0.762 | -3.497 | 1 | 0.0395 | 0.19200 | 0.002440 | 0.0863 | 0.553 | 215280 |
| Don’t Start Now | Dua Lipa | r&b | urban contemporary | 97 | 0.794 | 0.793 | -4.521 | 0 | 0.0842 | 0.01250 | 0.000000 | 0.0952 | 0.677 | 183290 |
| everything i wanted | Billie Eilish | r&b | urban contemporary | 97 | 0.704 | 0.225 | -14.454 | 0 | 0.0994 | 0.90200 | 0.657000 | 0.1060 | 0.243 | 245426 |

Above are the 12 most popular rows in the R&B genre, which cover only 8 distinct songs: Arizona Zervas’ ROXANNE, Roddy Ricch’s The Box, Maroon 5’s Memories, and The Weeknd’s Blinding Lights each appear twice because they are listed under two different subgenres. All of them were released in 2019, and all are categorized under my two favorite subgenres of R&B, Urban Contemporary and Hip Pop. All of them also boast a danceability score above 0.5, and most of them (with the exception of Maroon 5’s Memories and Billie Eilish’s everything i wanted) have energy scores above 0.5. Across the board, the songs have low speechiness and instrumentalness scores (with the exception of Billie Eilish’s everything i wanted, which has an instrumentalness of 0.657). Interestingly, all of the songs fall within a valence of roughly 0.2-0.7. The other characteristics are quite varied. So, for the purposes of my analysis of the R&B genre, I will focus only on the song characteristics that show clear trends across the genre: danceability, energy, speechiness, instrumentalness, and valence.
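
Because a song listed under two subgenres appears once per subgenre, `head()` on the raw rows returns fewer distinct songs than rows. A minimal sketch of collapsing the duplicates first, using a few rows copied from the table above (`toy_randb` is a made-up name, and keeping only the first subgenre per song is a simplifying assumption):

```r
library(dplyr)

# A few rows from the table, including two songs that appear in both subgenres
toy_randb <- tibble(
  track_name        = c("ROXANNE", "ROXANNE", "The Box", "The Box", "Tusa"),
  track_artist      = c("Arizona Zervas", "Arizona Zervas",
                        "Roddy Ricch", "Roddy Ricch", "KAROL G"),
  playlist_subgenre = c("urban contemporary", "hip pop",
                        "urban contemporary", "hip pop", "hip pop"),
  track_popularity  = c(99, 99, 98, 98, 98)
)

top_distinct <- toy_randb %>%
  # keep one row per song; the first subgenre encountered is retained
  distinct(track_name, track_artist, .keep_all = TRUE) %>%
  arrange(desc(track_popularity)) %>%
  head(3)

top_distinct
```

Applied to `randb` with `head(10)`, the same pattern would give ten distinct songs rather than ten rows.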

What now? Why is our analysis important?

As it becomes easier to produce and release music from one’s own bedroom, and as streaming platforms such as Apple Music and Spotify make music increasingly accessible to everyone, we believe our analysis has important implications: it can help listeners find new songs that they like and help platforms build algorithms that give their users better and more relevant song recommendations.

A disclaimer: Correlation does not equal causation.

Of course, correlation does not equal causation. Just because the

A Conclusion

Thanks to streaming platforms such as Spotify and Apple Music, small creators are also given a platform for creative release. Our analyses of pop, rap, and R&B can also help small artists grow their own platforms by catering to the interests of specific audiences. At a time like now, when the consumption of art (whether in the form of movies, music, or television) is essential to one’s mental wellbeing, our analysis can help boost these efforts. By asking the question “What makes a song in a given genre popular?”, we have taken a close look at the specific characteristics of songs with a popularity rating of 75 or higher.